Natural Language Processing as a Source of Linguistic Knowledge
Abstract
The paper discusses a number of specific problems of natural text parsing that emerge during the operation of a highly developed rule-based machine translation system, ETAP-3. Emphasis is laid on two classes of problems: 1) the adequacy of the linguistic description of the working languages of the MT system and 2) the means of resolving lexical and syntactic ambiguity in the source text. It is claimed that no parser, however sophisticated or advanced, can be made entirely free of gaps. The reason is that many linguistic facts, including those critical for parser operation, have never come to the attention of researchers, simply because they have not had at their disposal a large body of unexpected or incorrect parses. It is exactly such material that a highly developed NLP system provides in abundance. If handled properly, this feedback helps the researcher to find the gaps in scientific descriptions and eliminate them. Consequently, linguistic experimentation with NLP systems becomes a legitimate and very promising scientific method. In a way, linguistic applications start to stimulate theoretical research, thus inverting the situation that has existed ever since NLP came into being.

The paper deals with a number of instructive cases that came to light in the course of experimental operation of the Russian-to-English automatic translation module of a high-level multifunctional NLP system, ETAP-3, developed by a Moscow research team [1-3]. The system, largely based on the Meaning ⇔ Text theory of Igor Mel'čuk [4], makes use of dependency syntax: the syntactic structure of any sentence is represented as a dependency tree whose nodes correspond to all words of the sentence and whose arcs are each labeled with the name of one of several dozen syntactic relations. This method of syntactic representation will be essential in the following account.
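The dependency representation described above can be illustrated with a minimal Python sketch. It is not ETAP-3 code: the class `Node`, the helpers `is_well_formed` and `dependents`, the English example sentence, and the relation labels are all hypothetical and chosen only to show the idea of one node per word and one labeled arc per dependent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One word of the sentence (hypothetical structure, not ETAP-3's)."""
    index: int                # position of the word in the sentence
    word: str
    head: Optional[int]       # index of the governing word; None for the root
    relation: Optional[str]   # label of the arc from head to this word


def is_well_formed(nodes: List[Node]) -> bool:
    """Check that the nodes form a tree: exactly one root, no cycles,
    and every head index refers to an existing node."""
    by_index = {n.index: n for n in nodes}
    if sum(1 for n in nodes if n.head is None) != 1:
        return False
    for start in nodes:
        seen = set()
        cur = start
        while cur.head is not None:
            if cur.index in seen or cur.head not in by_index:
                return False
            seen.add(cur.index)
            cur = by_index[cur.head]
    return True


def dependents(nodes: List[Node], head_index: int) -> List[Tuple[str, str]]:
    """Return (word, relation) pairs for the direct dependents of a word."""
    return [(n.word, n.relation) for n in nodes if n.head == head_index]


# Illustrative labels only; they are not taken from the system's inventory.
SENTENCE = [
    Node(1, "The", 2, "determinative"),
    Node(2, "system", 3, "predicative"),
    Node(3, "translates", None, None),
    Node(4, "sentences", 3, "completive"),
]

if __name__ == "__main__":
    print(is_well_formed(SENTENCE))
    print(dependents(SENTENCE, 3))
```

Storing each word's head and arc label on the dependent keeps the tree flat and easy to inspect, which suits the kind of parse-by-parse scrutiny the account relies on.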
All cases unfolded in a fairly similar way: the MT module was offered Russian sentences for translation, for which it yielded unsatisfactory English equivalents. Normally, the sentences came from current Russian Internet news sites. The system's operation was then subjected to close scrutiny, which enabled the experimenters to locate the errors and correct them where possible. Conversely, when any of the errors proved impossible to correct, the experimenters could take a step towards establishing the natural limits of machine translation performance. Below, I will consider some representative situations in which unsatisfactory performance of the NLP system has led to important theoretical findings. It can readily …
Similar resources
The Source of Human Knowledge: Plato’s problem and Orwell’s problem
Chomsky cannot help wondering at the fact that, despite such vast evidence, we have little knowledge about the obvious. A good example, I think, is the way a child acquires a first language. A great many researchers have studied various aspects of child language acquisition at different stages of the child's life and have brought to light many details of language development. However,...
Linguistic Simulation of Semantic Invariants for Multilingual Knowledge Management Systems
Introduction. The problem addressed in this paper is the establishment of semantic invariants to serve as a kind of metalanguage of "senses" valid for the natural language systems under consideration. We see the key objective of natural language processing in developing multilingual facilities for computer access to knowledge contained in texts. Our experience in design and implementation of natural ...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the growth of digital content on the internet and social media, sentiment analysis has become an emerging field. The problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge
This paper addresses the issue of "algorithm vs. representation" for case-based learning of linguistic knowledge. We first present empirical evidence that the success of case-based learning methods for natural language processing tasks depends to a large degree on the feature set used to describe the training instances. Next, we present a technique for automating feature set selection for case-...
Natural language processing for documentation analysis
In view of the increasing interest in ontologies as a source of world knowledge, this deliverable presents different types of ontologies and describes the approach adopted within the Klase project towards the problem of mapping specialized linguistic ontologies to generic resources. It reports on investigations related to the possibility of applying linguistic ontologies to the problem of inter...